Tuning an Amazon EMR cluster for best performance with Ideata
Enabling CPU Scheduling
CPU scheduling is not enabled by default. To enable it, replace the DefaultResourceCalculator with the DominantResourceCalculator by setting the following property in the /etc/hadoop/conf/capacity-scheduler.xml file on the ResourceManager and NodeManager hosts:
Property: yarn.scheduler.capacity.resource-calculator
Value: org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
1. Log in to your EMR master instance and update the following property in the /etc/hadoop/conf/capacity-scheduler.xml file:
<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<!-- <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value> -->
<value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
2. Restart the instance with the following command so that the change takes effect:
sudo reboot
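If you prefer to script the edit rather than make it by hand, the following is a minimal sketch in Python, assuming the file uses the usual Hadoop layout of a <configuration> root containing <property> elements. The helper name set_property is hypothetical and not part of any Hadoop or EMR tool. Note that the standard-library parser drops XML comments and the stylesheet line when the document is written back, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET


def set_property(config_xml: str, name: str, value: str) -> str:
    """Return config_xml with the property `name` set to `value`.

    Hypothetical helper: updates the property if it exists,
    otherwise appends a new <property> element.
    """
    root = ET.fromstring(config_xml)  # expects a <configuration> root
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No matching property was found, so add one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Sample input standing in for capacity-scheduler.xml.
sample = """<configuration>
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
  </property>
</configuration>"""

updated = set_property(
    sample,
    "yarn.scheduler.capacity.resource-calculator",
    "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator",
)
print(updated)
```

On the cluster you would read /etc/hadoop/conf/capacity-scheduler.xml into a string, pass it through the helper, and write the result back with sufficient privileges.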
Configuring the EMRFS File
The EMR File System (EMRFS) and the Hadoop Distributed File System (HDFS) are both installed on your EMR cluster.
EMRFS is an implementation of HDFS which allows EMR clusters to store data on Amazon S3.
EMRFS tries to verify list consistency for objects tracked in its metadata for a specific number of retries; the default is 5. If the number of retries is exceeded, the originating job returns a failure. To overcome this issue, you can override the default EMRFS configuration with the following steps:
Step 1: Log in to your EMR master machine.
Step 2: Add the following properties to /usr/share/aws/emr/emrfs/conf/emrfs-site.xml:
sudo vi /usr/share/aws/emr/emrfs/conf/emrfs-site.xml
<property>
<name>fs.s3.consistent.throwExceptionOnInconsistency</name>
<value>false</value>
</property>
<property>
<name>fs.s3.consistent.retryPolicyType</name>
<value>fixed</value>
</property>
<property>
<name>fs.s3.consistent.retryPeriodSeconds</name>
<value>10</value>
</property>
<property>
<name>fs.s3.consistent</name>
<value>false</value>
</property>
Your emrfs-site.xml file should then look like this:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.s3.consistent.throwExceptionOnInconsistency</name>
<value>false</value>
</property>
<property>
<name>fs.s3.consistent.retryPolicyType</name>
<value>fixed</value>
</property>
<property>
<name>fs.s3.consistent.retryPeriodSeconds</name>
<value>10</value>
</property>
<property>
<name>fs.s3.consistent</name>
<value>false</value>
</property>
</configuration>
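To confirm that the file contains the values listed above, a short check can help. This is a minimal sketch in Python, not an EMR tool: the names EXPECTED, read_properties, and mismatches are hypothetical, and the sample string stands in for the contents of emrfs-site.xml.

```python
import xml.etree.ElementTree as ET

# Values taken from the properties listed in this section.
EXPECTED = {
    "fs.s3.consistent.throwExceptionOnInconsistency": "false",
    "fs.s3.consistent.retryPolicyType": "fixed",
    "fs.s3.consistent.retryPeriodSeconds": "10",
    "fs.s3.consistent": "false",
}


def read_properties(config_xml: str) -> dict:
    """Map each <property> name to its value in a Hadoop-style config."""
    root = ET.fromstring(config_xml)
    return {
        (p.findtext("name") or "").strip(): (p.findtext("value") or "").strip()
        for p in root.findall("property")
    }


def mismatches(config_xml: str, expected: dict = EXPECTED) -> dict:
    """Return expected names whose actual value differs (None if missing)."""
    actual = read_properties(config_xml)
    return {
        name: actual.get(name)
        for name, want in expected.items()
        if actual.get(name) != want
    }


# Sample input standing in for emrfs-site.xml.
sample = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.s3.consistent.throwExceptionOnInconsistency</name>
<value>false</value>
</property>
<property>
<name>fs.s3.consistent.retryPolicyType</name>
<value>fixed</value>
</property>
<property>
<name>fs.s3.consistent.retryPeriodSeconds</name>
<value>10</value>
</property>
<property>
<name>fs.s3.consistent</name>
<value>false</value>
</property>
</configuration>"""

print(mismatches(sample))  # an empty dict means every value matches
```

On the cluster you would pass in the text of /usr/share/aws/emr/emrfs/conf/emrfs-site.xml instead of the sample; any entry in the returned dict points to a property that is missing or set differently.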